Report

The CSV file contains many more columns than are loaded here. They are omitted from this report, but can be added to the column list if needed.
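To see which columns are being left out, one option is to read only the header row of the file and compare it with the list of columns that get loaded. The sketch below uses a small in-memory CSV as a stand-in for the real file; the `extra_column_a` and `extra_column_b` names are invented for illustration and are not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV; the two "extra" column names are made up
sample_csv = io.StringIO(
    "installation_date,system_size_DC,extra_column_a,extra_column_b\n"
    "2015-06-01,5.2,1,x\n"
)

# nrows=0 parses only the header row, so no data rows are loaded
available = pd.read_csv(sample_csv, nrows=0).columns.tolist()

# Columns we intend to load (a shortened version of the list used in this report)
wanted = ['installation_date', 'system_size_DC']

# Everything in the file that is not in the wanted list
omitted = [c for c in available if c not in wanted]
print(omitted)
```

With the real file, passing its path instead of `sample_csv` should list the omitted columns in the same way, without reading the full dataset into memory.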

In [1]:
# Import required packages
import numpy as np
import pandas as pd
# Note: pandas_profiling has since been renamed ydata_profiling;
# the import below matches the older package name
from pandas_profiling import ProfileReport
import matplotlib.pyplot as plt
# import plotly.io as pio
# pio.renderers.default='notebook'
# import plotly.express as px

# Load only the columns used in this report
cols = [
    'installation_date', 'system_size_DC', 'total_installed_price',
    'rebate_or_grant', 'customer_segment', 'expansion_system',
    'multiple_phase_system', 'new_construction', 'tracking',
    'third_party_owned', 'installer_name', 'self_installed',
    'ground_mounted', 'zip_code', 'city', 'year', 'fips', 'modules',
    'technology_module_1', 'technology_module_2', 'technology_module_3',
]
df = pd.read_csv('data/ca_sol_fips.csv', usecols=cols, low_memory=False)

# Treat the -9999 sentinel as a missing value
df = df.replace(-9999, np.nan)

# Build the profiling report and display it inline
profile = ProfileReport(df, title="Pandas Profiling Report")
profile
Out[1]:
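The cell above loads the data first and then replaces the `-9999` sentinel with `NaN`. A possible alternative is to mark the sentinel as missing while parsing, using the `na_values` parameter of `pd.read_csv`. The sketch below compares the two approaches on a tiny made-up CSV (the values are invented; only the column names come from the report). The two may not behave identically on text columns, since `na_values` is applied to the raw text of every field.

```python
import io
import numpy as np
import pandas as pd

# Invented sample data using two column names from the report
csv_text = (
    "system_size_DC,total_installed_price\n"
    "5.2,-9999\n"
    "-9999,18000\n"
)

# Approach used in the cell above: load, then replace the sentinel
replaced = pd.read_csv(io.StringIO(csv_text)).replace(-9999, np.nan)

# Alternative: treat the sentinel as missing at parse time
parsed = pd.read_csv(io.StringIO(csv_text), na_values=[-9999])

# Count missing values produced by each approach
n_replaced = int(replaced.isna().sum().sum())
n_parsed = int(parsed.isna().sum().sum())
print(n_replaced, n_parsed)
```

Either way, the profiling report then counts these entries as missing rather than as extreme numeric values.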